Julia Schulte-Cloos
University of Marburg
November 20, 2024
Thomas Lin Pedersen 2020 (celebRation2020)
Cedric Scherer 2023 (Posit Conference)
Use tidyverse techniques to bring the data into an optimal format for plotting → pass the data to ggplot
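The workflow above can be sketched as follows. This is only an illustration: it uses the built-in mtcars data rather than the course data, and the summary step and the names plot_data and p_prepared are assumptions chosen for the example.

```r
library(dplyr)
library(ggplot2)

# Preparation step (illustrative): one row per cylinder group with its mean mpg
plot_data <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

# Pass the prepared data to ggplot(); geom_col() draws the values as they are
p_prepared <- ggplot(plot_data, aes(x = cyl, y = mean_mpg)) +
  geom_col()

p_prepared
```

Keeping the preparation in a separate, named step makes it easy to inspect the data before any plotting code is written.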
x, y: variable along the x and y axis
colour: color of geoms according to data
fill: the inside color of the geom
group: what group a geom belongs to
shape: the symbol used to plot a point
linetype: the type of line used (solid, dashed, etc.)
size: size scaling for an extra dimension
alpha: the transparency of the geom

geom_point(): scatterplot
geom_line(): lines connecting points by increasing value of x
geom_path(): lines connecting points in sequence of appearance
geom_boxplot(): box-and-whiskers plot for categorical variables
geom_bar(): bar charts for categorical x axis
geom_histogram(): histogram for continuous x axis
geom_violin(): kernel density estimate of the data's distribution
geom_smooth(): smoothed trend line fitted to the data

geom_*() is a shortcut for a call to the function layer()
Every layer has a statistic; a layer can be added via stat_*() or geom_*()
stat = 'identity' leaves the data as is

Scales → translate back and forth between variable ranges and property ranges
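To make the relationship between geoms, stats, and layer() concrete, here is a minimal sketch using the built-in mtcars data (an assumption for illustration, not the course data). It compares geom_point() with an explicit layer() call and shows that geom_bar() computes counts through its default statistic.

```r
library(ggplot2)

# geom_point() ...
p_geom <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# ... is a shortcut for layer() with a point geom and the identity stat
p_layer <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  layer(geom = "point", stat = "identity", position = "identity")

# layer_data() returns the data a layer actually draws
d_geom  <- layer_data(p_geom, 1)
d_layer <- layer_data(p_layer, 1)

# geom_bar() uses stat = "count" by default, so the bar heights are computed
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()
d_count <- layer_data(p_count, 1)
d_count$count
```

With stat = "identity" the layer data keeps the original x and y values, whereas the count statistic replaces the raw rows with one row per category.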
continuous position scales (scale_x_continuous()); special case of continuous scales: date scales (scale_x_date()) vs. discrete position scales
continuous color scales (scale_fill_viridis_c(); scale_fill_distiller()), discrete color scales (scale_fill_viridis_d(); scale_fill_brewer())
paletteer acts as a common interface for the different color palette packages

ggplot(bikes, aes(x = temp_feel, y = count)) +
  # color mapping only applied to points
  geom_point(aes(color = day_night)) +
  # invisible grouping to create two trend lines
  stat_smooth(aes(group = day_night)) +
  scale_color_viridis_d() +
  # x axis
  scale_x_continuous(
    # add °C symbol
    labels = function(x) paste0(x, "°C"),
    # use 5°C spacing
    breaks = -1:6*5 # also: seq(-5, 30, by = 5)
  ) +
  # y axis
  scale_y_continuous(
    # add a thousand separator
    labels = scales::label_comma(),
    # use consistent spacing across rows
    breaks = 0:5*10000
  )

facet_wrap() and facet_grid()

ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_point(
    aes(color = season),
    alpha = .5, size = 1.5
  ) +
  stat_smooth(
    method = "lm", color = "black"
  ) +
  scale_color_viridis_d(
    # overwrite legend keys
    labels = c("Winter", "Spring", "Summer", "Autumn")
  ) +
  # x axis
  scale_x_continuous(
    # add °C symbol
    labels = function(x) paste0(x, "°C"),
    # use 5°C spacing
    breaks = -1:6*5 # also: seq(-5, 30, by = 5)
  ) +
  # y axis
  scale_y_continuous(
    # add a thousand separator
    labels = scales::label_comma(),
    # use consistent spacing across rows
    breaks = 0:5*10000
  ) +
  # small multiples
  facet_wrap(facets = vars(day_night)) # also: ~ day_night

plot <- ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_point(
    aes(color = season),
    alpha = .5, size = 1.5
  ) +
  stat_smooth(
    method = "lm", color = "black"
  ) +
  scale_color_viridis_d(
    # overwrite legend keys
    labels = c("Winter", "Spring", "Summer", "Autumn")
  ) +
  # x axis
  scale_x_continuous(
    # add °C symbol
    labels = function(x) paste0(x, "°C"),
    # use 5°C spacing
    breaks = -1:6*5 # also: seq(-5, 30, by = 5)
  ) +
  # y axis
  scale_y_continuous(
    # add a thousand separator
    labels = scales::label_comma(),
    # use consistent spacing across rows
    breaks = 0:5*10000
  ) +
  facet_grid(
    rows = vars(day_night),
    cols = vars(year),
    # free y axis range
    scales = "free_y",
    # scale heights proportionally to length of y scale
    space = "free_y"
  ) +
  labs(
    # overwrite axis and legend titles
    x = "Average feels-like temperature", y = NULL, color = NULL,
    # add plot title and caption
    title = "Trends of Reported Bike Rents versus Feels-Like Temperature in London",
    caption = "Data: TfL (Transport for London), Jan 2015–Dec 2016"
  )

ggplot2 extensions with ready-built themes, e.g.:
{ggdark}
{ggsci} (also color scales)
{ggtech} (also color scales)
{ggthemes} (also color scales)
{ggthemr}
{hrbrthemes} (also color scales)
{tvthemes} (also color scales)

plot +
  theme_light(base_size = 11, base_family = "Spline Sans") +
  # theme adjustments
  theme(
    plot.title.position = "plot", # left-align title
    plot.caption.position = "plot", # right-align caption
    legend.position = "top", # place legend above plot
    plot.title = element_text(face = "bold", size = rel(1.2)), # larger, bold title
    axis.text = element_text(family = "Spline Sans Mono"), # monospaced font for axes
    axis.title.x = element_text( # left-aligned, grey x axis label
      hjust = 0, color = "grey20", margin = margin(t = 12)
    ),
    legend.text = element_text(size = rel(1)), # larger legend labels
    strip.text = element_text(face = "bold", size = rel(1.15)), # larger, bold facet labels
    panel.grid.major.x = element_blank(), # no vertical major lines
    panel.grid.minor = element_blank(), # no minor grid lines
    panel.spacing.x = unit(20, "pt"), # increase white space between panels
    panel.spacing.y = unit(10, "pt"), # increase white space between panels
    plot.margin = margin(15, 15, 15, 15) # adjust white space around plot (top, right, bottom, left)
  )

NA values)
coord_polar(): polar coordinate system that interprets x and y as angle and radius (by default, theta = "x")

Work through 02_foundations_exercises.qmd together with your neighbor: run the code yourself and try to understand the syntax behind the Grammar of Graphics.
➡️ For each of the chart types involved (e.g. bar chart, scatter plot, etc.), consult the excellent book Fundamentals of Data Visualization by Claus Wilke to develop an intuition for good and bad practices for each type of chart.
➡️ You may also want to consult the book’s source code to check out the code behind the best practice examples.
Use data that is relevant to your field and the subject of your PhD. Try to visualize some of it with ggplot2 and its Grammar of Graphics framework. You may want to consider visualizing distributions, proportions, associations, and uncertainty.
🤓 Try to incorporate the “Principle of Proportional Ink” into your visualizations.
➡️ If you are unsure which dataset to look at, explore the bike sharing dataset introduced earlier and challenge yourself by completing Exercise 2.
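As a small illustration of the Principle of Proportional Ink mentioned above, the following sketch draws a bar chart whose bars start at zero, so that the inked area stays proportional to the values. It uses the built-in mtcars data as a stand-in for your own data; the object names are chosen only for this example.

```r
library(ggplot2)

p_ink <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  scale_y_continuous(
    # keep the zero baseline, let the upper limit follow the data
    limits = c(0, NA),
    # remove padding below the bars, keep a little above them
    expand = expansion(mult = c(0, 0.05))
  ) +
  labs(x = "Number of cylinders", y = "Number of cars")

# each bar is drawn from ymin to ymax; ymin is the baseline
bar_baseline <- layer_data(p_ink, 1)$ymin

p_ink
```

Truncating the y axis of a bar chart would break this proportionality, which is why the lower limit is fixed at zero here.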